Registered S3 methods overwritten by 'htmltools':
method from
print.html tools:rstudio
print.shiny.tag tools:rstudio
print.shiny.tag.list tools:rstudio
library(tidyverse)
Registered S3 method overwritten by 'dplyr':
method from
print.rowwise_df
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
[37m── [1mAttaching packages[22m ─────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──[39m
[37m[32m✓[37m [34mggplot2[37m 3.2.1 [32m✓[37m [34mpurrr [37m 0.3.3
[32m✓[37m [34mtibble [37m 2.1.3 [32m✓[37m [34mdplyr [37m 0.8.3
[32m✓[37m [34mtidyr [37m 1.0.0 [32m✓[37m [34mstringr[37m 1.4.0
[32m✓[37m [34mreadr [37m 1.3.1 [32m✓[37m [34mforcats[37m 0.4.0[39m
[37m── [1mConflicts[22m ────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
[31mx[37m [34mdplyr[37m::[32mfilter()[37m masks [34mstats[37m::filter()
[31mx[37m [34mdplyr[37m::[32mlag()[37m masks [34mstats[37m::lag()[39m
library(lubridate)
Attaching package: ‘lubridate’
The following object is masked from ‘package:base’:
date
Here is our first task:
The project goal is to identify patients seen for drug overdose, determine if they had an active opioid at the start of the encounter, and if they had any readmissions for drug overdose.
Your task is to assemble the study cohort by identifying encounters that meet the following criteria:
Sounds great. Let’s start by taking a look at the data.
allergies <- read_csv("datasets/allergies.csv")
Parsed with column specification:
cols(
START = [34mcol_date(format = "")[39m,
STOP = [34mcol_date(format = "")[39m,
PATIENT = [31mcol_character()[39m,
ENCOUNTER = [31mcol_character()[39m,
CODE = [32mcol_double()[39m,
DESCRIPTION = [31mcol_character()[39m
)
allergies
encounters <- read_csv("datasets/encounters.csv")
Parsed with column specification:
cols(
Id = [31mcol_character()[39m,
START = [34mcol_datetime(format = "")[39m,
STOP = [34mcol_datetime(format = "")[39m,
PATIENT = [31mcol_character()[39m,
PROVIDER = [31mcol_character()[39m,
ENCOUNTERCLASS = [31mcol_character()[39m,
CODE = [32mcol_double()[39m,
DESCRIPTION = [31mcol_character()[39m,
COST = [32mcol_double()[39m,
REASONCODE = [32mcol_double()[39m,
REASONDESCRIPTION = [31mcol_character()[39m
)
encounters
medications <- read_csv("datasets/medications.csv")
Parsed with column specification:
cols(
START = [34mcol_date(format = "")[39m,
STOP = [34mcol_date(format = "")[39m,
PATIENT = [31mcol_character()[39m,
ENCOUNTER = [31mcol_character()[39m,
CODE = [32mcol_double()[39m,
DESCRIPTION = [31mcol_character()[39m,
COST = [32mcol_double()[39m,
DISPENSES = [32mcol_double()[39m,
TOTALCOST = [32mcol_double()[39m,
REASONCODE = [32mcol_double()[39m,
REASONDESCRIPTION = [31mcol_character()[39m
)
medications
patients <- read_csv("datasets/patients.csv")
Parsed with column specification:
cols(
.default = col_character(),
BIRTHDATE = [34mcol_date(format = "")[39m,
DEATHDATE = [34mcol_date(format = "")[39m,
ZIP = [32mcol_double()[39m
)
See spec(...) for full column specifications.
patients
procedures <- read_csv("datasets/procedures.csv")
Parsed with column specification:
cols(
DATE = [34mcol_date(format = "")[39m,
PATIENT.x = [31mcol_character()[39m,
ENCOUNTER = [31mcol_character()[39m,
CODE.x = [31mcol_character()[39m,
DESCRIPTION.x = [31mcol_character()[39m,
COST.x = [32mcol_double()[39m,
REASONCODE.x = [32mcol_double()[39m,
REASONDESCRIPTION.x = [31mcol_character()[39m
)
procedures
Ok, we are chiefly interested in the encounters table, and basically want to filter it based on the specifications given in the task. Let’s start by filtering the encounters by drug overdose. Looking at the data dictionary sheet for the encounters table, we can see that the REASONCODE column are SNOMED-CT codes.
We can lookup the code for a drug overdose here: https://browser.ihtsdotools.org/, which has the code as 55680006.
drug_overdoses <- filter(encounters, REASONCODE == 55680006)
drug_overdoses
Great, now we just need to filter for encounters that occur after July 15, 1999.
The encounters table has two column that represent the date of the encounter. START and STOP, further clarification would be neccessary to determine if the task is to find encounters that begin after 07/15/1999 or end at that date. For the purposes of this exercise, we’ll go with encounters that begin after that date due to the term occur in the specification.
after_date <- filter(drug_overdoses, START > "1999-07-15")
arrange(after_date, START)
Now we’re concerned with encounters with patients between the ages of 18 and 35; we’ll need to join the patients table to handle that.
with_patients <- inner_join(after_date, patients, c("PATIENT" = "Id"))
with_patients
Based upon the wording in the specifications, the patient’s age must be greater than or equal to 18 at the start of an encounter and less than or equal to 35 at the end of the encounter.
Let’s make sure that there are no encounters in our table that has not ended, because a patient could age to 36 by the time the encounter is over.
not_ended <- drop_na(with_patients, STOP)
not_ended
Turns out we’re ok. Let’s do the filtering now. First we’ll need to calculate the age of the patient at the start and end of the encounter.
age <- mutate(not_ended, AGEATSTART = as.period(interval(BIRTHDATE, START))$year)
age <- mutate(age, AGEATSTOP = as.period(interval(BIRTHDATE, STOP))$year)
select(age, Id, AGEATSTART, AGEATSTOP)
aged <- filter(age, AGEATSTART >= 18 & AGEATSTOP <= 35)
aged
That finishes up the first task.
With your drug overdose encounter, create the following indicators:
DEATH_AT_VISIT_IND: 1 if patient died during the drug overdose encounter, 0 if the patient died at a different timeCOUNT_CURRENT_MEDS: Count of active medications at the start of the drug overdose encounterCURRENT_OPIOID_IND: 1 if the patient had at least one active medication at the start of the overdose encounter that is on the Opioids List (provided below), 0 if notREADMISSION_90_DAY_IND: 1 if the visit resulted in a subsequent drug overdose readmission within 90 days, 0 if notREADMISSION_30_DAY_IND: 1 if the visit resulted in a subsequent drug overdose readmission within 30 days, 0 if not overdose encounter, 0 if notFIRST_READMISSION_DATE: The date of the index visit’s first readmission for drug overdose. Field should be left as N/A if no readmission for drug overdose within 90 daysOpioids List: * Hydromorphone 325Mg * Fentanyl – 100 MCG * Oxycodone-acetaminophen 100 Ml
Ok, looking at the data, it seems the only field we have to infer death on is in the patients table with the DEATHDATE column. If the date falls within the encounter dates, then we’ll mark it 1. The specifications don’t state what to do if the patient hasn’t died, this would need clarification, but for the purposes of this exercise we’ll leave it blank in those cases.
died <- mutate(aged, DEATH_AT_VISIT_IND = as.integer(DEATHDATE >= START & DEATHDATE <= STOP))
select(died, START, STOP, DEATHDATE, DEATH_AT_VISIT_IND)
For COUNT_CURRENT_MEDS we’ll have to used the medications table.
For CURRENT_OPOID_IND we’ll have to lookup the codes for the opoids in question; however, the codes in the medications table do not appear to match up with results found on: https://mor.nlm.nih.gov/RxNav/ (which is the RxNorm database the data dictionary mentioned). We would need clarification on this, but for this exercise we’ll search by the DESCRIPTION column instead.
Some drugs have multiple components/ingredients. It’s unsure whether we only should concern ourselves with the pure drugs of interest or also these. For example: Amlodipine 5 MG / Fentanyl 100 MCG / Olmesartan medoxomil 20 MG vs Fentanyl 100 MCG. We would need clarification, but for the exercise we’ll only examine the pure drugs because multiple ingredients can modulate the effects of the drug in question, this is of course an assumption; and it’s a little closer to the specification.
opoids = c("Hydromorphone 325 MG", "Fentanyl 100 MCG", "Oxycodone-acetaminophen 100ML")
get_meds <- function(start, pt) {
filter(medications, PATIENT == pt & START <= start & STOP <= start)
}
current_meds <- died %>%
mutate(CURRENTMEDS = pmap(list(START, PATIENT), get_meds)) %>%
mutate(COUNT_CURRENT_MEDS = map_int(CURRENTMEDS, nrow)) %>%
mutate(CURRENT_OPOID_IND = map_int(CURRENTMEDS, function(med) any(med$DESCRIPTION %in% opoids)))
select(current_meds, CURRENTMEDS, COUNT_CURRENT_MEDS, CURRENT_OPOID_IND)
Now we can begin looking at readmissions. Let’s calculate how many days it will be to the next encounter.
readmission <- current_meds %>%
arrange(START) %>%
group_by(PATIENT) %>%
mutate(READMISSION = as.period(lead(START) - STOP)$day) %>%
mutate(FIRST_READMISSION_DATE = lead(START)) %>%
ungroup
readmission$READMISSION
[1] NA NA 5476 2213 NA NA 3634 NA NA NA NA NA NA 5047 5023 NA 3807 NA NA NA NA
[22] 3069 NA 3694 NA 183 NA 2783 3966 739 NA NA 2691 NA NA NA 278 2669 547 NA 3481 NA
[43] 3781 2098 NA NA NA 1680 3767 NA 2414 NA 2431 2429 938 1922 NA NA NA 2435 1681 2592 2919
[64] NA 1239 4060 1437 NA 1194 1440 1156 NA NA NA 2653 1430 2453 927 NA 2872 186 1268 NA 854
[85] 2262 NA 1023 NA 290 1410 1939 NA 789 NA 308 524 1082 NA 19 640 NA NA 1229 349 1040
[106] 244 93 113 2600 415 1061 403 626 531 465 1342 1455 2238 1441 118 NA NA 309 909 2264 244
[127] 636 1164 2492 NA 89 56 NA 180 749 NA NA 79 1691 NA 231 624 1472 NA 305 640 577
[148] NA NA NA 100 735 NA 419 NA 514 NA 2300 NA 1440 232 NA 727 222 744 NA 999 219
[169] 1283 917 877 NA 360 NA NA 734 1646 314 631 NA 421 437 506 522 NA 1663 475 NA NA
[190] NA 424 259 60 NA 418 1671 2406 383 852 1638 1363 1713 NA NA NA 580 496 476 70 295
[211] 768 NA 761 2328 1157 1146 206 NA NA 255 1603 130 NA 1426 308 665 NA NA 165 NA 417
[232] 268 768 413 1078 NA 482 NA NA 364 NA 261 NA 435 338 NA 2064 752 120 1703 NA NA
[253] 762 548 473 NA 363 NA 1143 NA NA NA NA 1542 NA 413 920 1176 NA 312 1246 NA 300
[274] 1284 NA 996 NA NA 305 1329 NA NA 127 497 1084 257 1597 947 1304 1421 418 NA 1317 NA
[295] 82 1411 331 NA NA 24 1141 1126 NA 561 NA 1301 431 1401 NA NA NA NA 1328 166 NA
[316] 351 NA 250 NA NA NA NA 342 387 NA NA 130 94 NA 664 NA NA NA NA 501 698
[337] NA 20 NA 103 NA 381 NA 380 289 NA NA 261 NA 83 NA 25 NA NA NA NA NA
[358] NA NA NA 370 NA NA NA 308 NA 133 NA NA 244 NA 212 NA NA NA NA 335 NA
[379] NA NA NA 256 NA NA 232 74 NA NA NA NA 93 NA NA 186 NA 152 NA NA 307
[400] NA NA 62 NA NA NA NA 62 NA NA NA 151 NA NA 43 NA 34 NA NA NA NA
[421] NA NA NA NA NA NA NA NA NA NA NA
Great, now we can add the READMISSION_90_DAY_IND and READMISSION_30_DAY_IND columns.
filtered_readmission <- readmission %>%
mutate(READMISSION_90_DAY_IND = as.integer(READMISSION <= 90)) %>%
mutate(READMISSION_30_DAY_IND = as.integer(READMISSION <= 30))
arrange(select(filtered_readmission, READMISSION, READMISSION_90_DAY_IND, READMISSION_30_DAY_IND), READMISSION)
Earlier when figuring out how many days there was till a readmission, we also put in what the readmission date is. We can use that here. This works because before doing that, we sorted the group dataframes by START. This way the next date is guaranteed to be the first readmission after a specific encounter.
#first_readmission_dated <- mutate(filtered_readmission, FIRST_READMISSION_DATE = as_datetime(ifelse(READMISSION_90_DAY_IND == 1, READMISSIONDATE, NA)))
first_readmission_dated <- mutate(filtered_readmission, FIRST_READMISSION_DATE = replace(FIRST_READMISSION_DATE, READMISSION_90_DAY_IND == 0, NA))
arrange(select(first_readmission_dated, READMISSION, READMISSION_90_DAY_IND, READMISSION_30_DAY_IND, FIRST_READMISSION_DATE), READMISSION)
And now we can move onto the final task.
CSV fileExport a dataset containing these required fields:
| Field name | Field Description | Data Type |
|---|---|---|
PATIENT_ID |
Patient identifier | Character String |
ENCOUNTER_ID |
Visit identifier | Character string |
HOSPITAL_ENCOUNTER_DATE |
Beginning of hospital encounter date | Date/time |
AGE_AT_VISIT |
Patient age at admission | Num |
DEATH_AT_VISIT_IND |
Indicator if the patient died during the drug overdose encounter. Leave N/A if patient has not died, |
0 /1 |
COUNT_CURRENT_MEDS |
Count of active medications at the start of the drug overdose encounter | Num |
CURRENT_OPIOID_IND |
if the patient had at least one active medication at the start of the overdose encounter that is on the Opioids List (provided below) | 0/1 |
READMISSION_90_DAY_IND |
Indicator if the visit resulted in a subsequent readmission within 90 days | 0/1 |
READMISSION_30_DAY_IND |
Indicator if the visit resulted in a subsequent readmission within 30 days | 0/1 |
FIRST_READMISSION_DATE |
Date of the first readmission for drug overdose within 90 days. Leave N/A if no readmissions for drug overdose within 90 days. |
Date/time |
This answers our question about DEATH_AT_VISIT_IND earlier. Our assumption was correct to leave it as N/A if the patient has not died.
Now we just need to grab these fields and write it to a CSV.
output <- first_readmission_dated %>%
select(PATIENT, Id, START, AGEATSTART, DEATH_AT_VISIT_IND, COUNT_CURRENT_MEDS, CURRENT_OPOID_IND, READMISSION_90_DAY_IND, READMISSION_30_DAY_IND, FIRST_READMISSION_DATE) %>%
rename(PATIENT_ID = PATIENT, ENCOUNTER_ID = Id, HOSPITAL_ENCOUNTER_DATE = START, AGE_AT_VISIT = AGEATSTART)
output